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ABSTRACT 

In  limited  data  tomography,  with  applications  such  as  elec¬ 
tron  microscopy  and  medical  imaging,  the  scanning  views 
are  within  an  angular  range  that  is  often  both  limited  and 
sparsely  sampled.  In  these  situations,  standard  algorithms 
produce  reconstructions  with  notorious  artifacts.  We  show  in 
this  paper  that  a  sparsity  image  representation  principle,  based 
on  learning  dictionaries  for  sparse  representations  of  image 
patches,  leads  to  significantly  improved  reconstructions  of  the 
unknown  density  from  its  limited  angle  projections.  The  pre¬ 
sentation  of  the  underlying  framework  is  complemented  with 
illustrative  results  on  artificial  and  real  data. 

Keywords:  Limited  angle  tomography,  sparse  representations, 
regularization 

1.  INTRODUCTION 

Tomography,  with  applications  such  as  electron  microscopy, 
medical  imaging,  and  industrial  non-destructive  testing,  refers 
to  the  recovery  of  the  density  distribution  inside  the  body  from 
its  given  projections.  We  are  primarily  interested  in  the  class 
of  tomography  which  can  be  modeled  by  the  Radon  trans¬ 
form.  In  limited  data  tomography ,  data  are  collected  over 
an  angular  range  that  is  either  limited  (due  to  physical  con¬ 
straints)  or  sparsely  sampled  (e.g.,  due  to  cost  savings  or  ra¬ 
diation  limitations).  The  use  of  standard  reconstruction  algo¬ 
rithms,  such  as  filtered  back-projection  (FBP)  methods,  pro¬ 
duces  reconstructions  with  notorious  artifacts,  see  Figure  1. 

In  dealing  with  the  ubiquitous  limited  angle  tomography, 
several  approaches  have  been  tested  (e.g.,  see  [1,  2,  3, 4,  5]  for 
more  recent  ones).  In  terms  of  artifacts,  methods  that  apply 
regularization  in  the  image  (density)  domain  show  higher  de¬ 
grees  of  success.  Nevertheless,  they  normally  assume  piece- 
wise  smoothness  of  the  unknown  image  and  are  still  vulnera¬ 
ble  to  the  artifacts,  unless  a  “reference  image”  is  used,  which 
is  not  always  available. 

For  the  sparse  angular  sampling  problem,  total  variation 
(related)  methods  have  been  shown  to  be  very  promising, 
mainly  when  applied  to  piecewise  constant  images  [1,  6,  7]. 
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Fig.  1.  Reconstructions  of  the  phantom  in  Figure  2  from  just 
11  noiseless  projections  extended  over  120  degrees.  FBP 
(left),  a  total  variation  based  method  (center),  and  the  results 
by  our  proposed  method  ( right). 

It  was  shown  in  [6]  that  if  the  unknown  signal  is  sparse  in 
some  given  representation  (e.g.,  in  some  vector  space),  then 
it  can  be  accurately  recovered,  with  high  probability,  from 
only  a  few  (random)  measurements.  An  example  is  the  exact 
recovery  of  a  piecewise  constant  image  like  the  Shepp-Logan 
Phantom  (which  has  only  a  few  non-zero  gradients)  from  only 
a  few  projections.  In  practice,  images  in  real  applications  are 
seldom  piecewise  constant,  and  therefore  finding  an  efficient 
representation  “basis”  in  which  the  unknown  image  is  sparse 
remains  an  open  problem.  Here  we  address  this  issue  from  a 
practical  viewpoint,  by  first  considering  small  image  patches. 
We  show  that  by  assuming  sparsity  of  the  patches  with  respect 
to  a  basis  that  in  turn  is  being  learned  (following  [8]),  we 
can  reconstruct  images  that  cannot  be  efficiently  recovered  by 
these  TV-based  methods,  see  Figure  1 .  We  should  add  that  the 
theoretical  results  in  [6]  do  not  address  the  important  case  of 
the  missing  wedge,  meaning  that  not  only  the  data  is  sparsely 
sampled  but  also  a  continuous  range  of  projections  is  missing 
(often  about  30%  of  the  total  range). 

2.  SPARSITY  MODELS  IN  TOMOGRAPHY 

2.1.  Sparsity  representation  of  patches 

The  present  work  is  motivated  by  the  image  processing  suc¬ 
cess  of  the  Sparseland  model  for  signal  recovery  problems 
[9] .  For  signals  from  a  class  T  c  Rn  ,  this  model  suggests  the 
existence  of  a  specific  redundant  dictionary  D  G  RN  x  K  that 
contains  K  atoms ,  such  that  for  any  signal  x  G  T,  there  exists 
a  sparse  linear  combination  of  atoms  from  D  that  approxi- 


mates  it  well.  More  formally,  this  means  that  \/x  G  T,  3  a  G 
such  that  x  ~  Da  and  ||a;||0  AA 

D  can  be  predefined  (such  as  wavelets)  or  learned  (e.g.,  by 
the  K-SVD  algorithm  [8]),  as  in  this  work.  Due  to  its  highly 
effectiveness  for  tasks  such  as  image  denoising,  demosaicing, 
and  inpainting,  in  particular  when  the  dictionary  is  learned  [9, 
10],  here  we  extend  this  idea  to  tomographic  reconstruction. 

To  make  this  framework  practical,  the  Sparseland  model, 
like  many  other  image-domain  regularization  methods,  con¬ 
siders  the  processing  of  small  overlapping  image  patches,  i.e., 
such  patches  are  the  ones  that  admit  a  sparse  representation. 
Assume  that  the  patches  are  of  size  yfn  x  y/n  (i.e.,  n  pixels  in 
each  patch),  then  the  idea  is  that  a  patch  Mx  can  be  approxi¬ 
mated  by  Da ,  where  M  is  a  n  x  N  binary  matrix  that  extracts 
the  patch  from  the  image  x,  D  G  RnxK  is  the  learned  dictio¬ 
nary,  and  a  G  RKx  J ,  with  J  being  the  number  of  patches. 
The  goal  then  is  to  learn  D  such  that  a  is  sparse  and  x  is  ef¬ 
ficiently  and  accurately  reconstructed  by  the  joint  sparse  rep¬ 
resentation  of  all  its  corresponding  overlapping  patches.  We 
now  briefly  present  the  framework  while  revisiting  the  appli¬ 
cation  of  this  model  to  image  denoising. 


2.2.  Image  denoising  model 

In  [9],  the  authors  considered  the  classical  model  for  image 
degradation,  y  =  x  +  w,  where  x  G  RN  is  the  clean  image, 
w  G  Rn  is  assumed  to  be  white  Gaussian  noise  with  a  fixed 
standard  deviation  a  (the  case  of  non-uniform  a  is  dealt  with 
in  [10]),  and  y  G  RNis  the  noisy  observed  image.  The  energy 
minimization  formulation  corresponding  to  the  simultaneous 
learning  of  D ,  computation  of  a ,  and  restoration  of  x,  is 


\a,D,x  \  =  arg  min  ]  A  \\x  -  y\\22  +  V  fij  ||oj||0  + 

^  )  a,U,x  I  — 


3  =  1 
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where  A  and  the  fij  ( j  =  1 , . . . ,  J)  are  positive,  and  the  columns 
of  a  G  RKxJ  are  the  coefficient  vectors  ay  (j  =  1, ...,  J),  in 
a  way  that  the  j-th  patch  MjX  is  approximated  by  Day.  The 
vector  x  G  RN  is  an  estimate  of  the  true  image.  The  first  term 
in  (1)  enforces  the  matching  to  the  data.  The  second  and  the 
last  terms  provide  regularization,  considering  that  the  solution 
has  a  sparse  representation  for  every  overlapping  patch  over 
the  learned  dictionary  D.  Details  on  the  optimization  of  this 
variational  formulation  are  presented  below. 


2.3.  Image  reconstruction  model:  The  tomography  case 

For  reconstruction  in  limited  angle  tomography,  we  model  the 
measurement  y  G  M7  as  y  =  Rx-\-w,  where  R  G  M/xAr  is  the 
(discrete)  Radon  transform  (I  projections),  and  w  G  M7  is  the 
noise,  which  causes  (in  general  non-uniform)  uncertainties  in 


the  pixel  (or  voxel,  if  in  3D)  intensities  of  the  reconstructed 
image.  If  the  reconstruction  process  is  linear,  these  uncertain¬ 
ties  can  be  estimated  (see,  e.g.,  [11]).  This  however  is  not 
the  case,  if  we  attempt  to  use  the  regularization  in  (1),  due 
to  the  non-linearity  introduced  by  the  operation  ||-||0,  which 
affects  the  (non-deterministic)  a.  Nevertheless,  assuming  a 
deterministic  a  and  uniform  noise,  we  can  restore  linearity 
and  compute  a  first  order  approximation  to  these  uncertain¬ 
ties.  It  turns  out  that  the  largest  uncertainties  occur  mainly  at 
the  boundaries  within  the  first  few  pixels,  whereas  in  the  in¬ 
terior  they  are  relatively  uniform.  As  a  result,  we  propose  to 
solve  the  tomographic  reconstruction  problem  via 


(a,D,xj  =  arg  min  ]  A  \\Rx  -  yf2  +  E^'  IK' llo  + 

’X  l  i=1 
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subject  to  the  condition  of  known  boundary  within  a  few  pix¬ 
els,  due  to  the  large  uncertainties  there.  Note  that  while  the 
measurements  are  in  the  Radon  domain  (first  term  in  (2)),  the 
imposed  sparsity  and  learned  dictionary  are  in  the  image  (den¬ 
sity)  domain. 


2.4.  Optimization  algorithm 

The  proposed  algorithm  follows  [8],  and  consists  of  steps  of 
the  K-SVD  algorithm  (downloaded  from  www.cs.technion.ac. 
il/~elad/)  for  image  denoising,  alternated  with  a  Radon  inver¬ 
sion  algorithm.  Let  d±, ...,  dx  be  the  columns  of  D,  which 
are  the  atoms  in  the  dictionary.  Let  ake  MJ  denote  the  k- 
th  row  of  a.  The  K-SVD  algorithm  for  denoising  of  gray 
scale  images  essentially  minimizes  the  objective  function  in 
(1)  with  respect  to  the  “coordinates”  aj(j  =  1, ...,  J),  dk(k  = 
1, ...,  K ),  ak(k  =  1, ...,  K ),  and  x.  First  (the  sparse  coding 
step),  the  minimization  is  carried  out  with  respect  to  each  ay 
( j  =  1 , . . . ,  J)  via  standard  orthogonal  matching  pursuit  (find 
the  best  atoms  from  the  given  dictionary),  and  then  (the  dic¬ 
tionary  update  step)  the  algorithm  optimizes  the  last  term  of 

(1)  with  respect  to  dj~  and  ak  simultaneously  (where  only  the 
nonzero  components  of  ak  are  considered,  improve  via  SVD 
the  atom  for  all  the  patches  that  selected  it  in  the  sparse  cod¬ 
ing  step),  for  each  k  =  1, ...,  K.  This  process  is  repeated 
a  few  times,  until  D  and  a  have  been  learned.  Finally  (the 
averaging  step),  the  objective  function  is  minimized  with  re¬ 
spect  to  x  (this  is  a  simple  weighted  average  of  the  overlap¬ 
ping  patches).  Here  we  skip  the  averaging  step,  optimizing 

(2)  with  respect  to  the  image  x,  which  is  a  simple  quadratic 
optimization  solved  using  Matlab.  There  is  a  critical  “sparsity 
threshold”  e  in  the  KSVD  algorithm:  higher  (lower)  6  induces 
fewer  (more)  atoms  to  approximate  a  patch  in  the  sparse  cod¬ 
ing  step.  It  is  determined  by  the  pixel  uncertainties  that,  as 
discussed  above,  in  the  case  of  tomography  are  difficult  to 


Fig.  2.  128  x  128  mathematical  phantom  with  a  fore¬ 

ground  that  has  a  uniform  density  (left,  Phantom  A),  and 
checkerboard-like  patterns  (right,  Phantom  B). 

compute.  Therefore,  we  set  5  as  a  free  parameter.  See  [10] 
for  more  discussion  about  this  parameter. 

3.  EXPERIMENTAL  RESULTS 

In  the  experiments  presented  below,  we  used  (see  (2))  a  La¬ 
grange  multiplier  A  =  1,  size  of  the  patches  n  =  25,  size 
of  the  redundant  dictionary  K  =  64,  and  known  boundary 
values  up  to  three  pixels  thick.  The  initial  dictionary  was  the 
Discrete  Cosine  Transform. 

3.1.  Simulated  data 

We  created  two  mathematical  2D  phantoms.  Both  phantoms 
contain  four  primitives,  with  constant  density  in  the  first  one, 
called  Phantom  A,  and  checkerboard-like  patterns  in  the  sec¬ 
ond  one,  called  Phantom  B.  See  Figure  2.  The  former  is  bi¬ 
nary  with  background  intensity  0.1  and  foreground  intensity 
1;  and  the  latter  has  a  minimal  intensity  -1  and  maximal  in¬ 
tensity  1 .  On  a  gray  scale,  black  represents  the  minimum  and 
white  represents  the  maximum. 

The  simulated  data  sets  consisted  of  11  projections  ex¬ 
tended  uniformly  over  two  thirds  of  the  standard  (e.g.,  in  elec¬ 
tron  tomography)  full  180°  range.  Typically,  when  standard 
reconstruction  algorithms  (such  as  the  FBP)  are  used,  many 
more  (the  order  of  hundreds)  projections  extending  over  the 
full  angular  range  are  required.  We  considered  both  perfect 
and  noisy  projection  data.  The  noise  in  the  measurements 
was  independent  Gaussians  with  unit  standard  deviation. 

For  comparisons,  we  performed  the  reconstructions  using 
a  FBP  method,  a  TV-based  method  (see  Appendix),  and  the 
proposed  approach.  See  figures  1  and  3  .  Note  that,  because 
of  the  missing  angular  range,  which  is  in  the  horizontal  di¬ 
rection,  edges  along  this  direction  are  notoriously  difficult  to 
recover.  We  used  e  =  0.01  and  e  =  0.05,  respectively,  for 
the  Phantom  A  and  Phantom  B.  In  all  cases,  the  number  of 
iterations  was  H  —  1,  000. 

These  preliminary  results  suggest  that  our  proposed  re¬ 
construction  method  outperforms  both  qualitatively  (e.g.,  less 
artifacts  and  more  contrast)  and  quantitatively  a  FBP  method 
and  a  TV-based  method.  See  Table  1,  where  the  estimation 
error  is  defined  to  be  \\x  —  xq  ||2,  with  x  and  xo ,  respectively, 
the  reconstructed  and  the  true  image. 


Fig.  3.  Top  row:  128x128  reconstructions  of  the  Phantom 
A  from  11  noiseless  projections.  FBP  (left),  a  TV-based  al¬ 
gorithm  (center),  and  the  results  by  our  proposed  method 
(right).  Bottom  row:  same  order  but  for  the  Phantom  B  and 
noisy  data.  Note  that  TV-based  minimization  works  well  with 
Phantom  A  (piecewise  constant)  but  not  with  Phantom  B. 


Table  1.  Estimation  error . 


Technique 

A/B  (noiseless) 

A/B  (noisy) 

FBP 

20/28 

18.6/23 

TV  minimization 

4.3/19 

11.1/21.5 

Proposed 

4.8/10.8 

8.7/12.2 

3.2.  Real  data 

We  performed  reconstructions  on221x221  images  from  den¬ 
tal  data  produced  by  the  Focus  intraoral  X-ray  source  and  the 
Sigma  intraoral  sensor  (Instrumentarium  Dental;  courtesy  of 
Maaria  Rantala,  from  PaloDex,  see  [5]  for  more  details).  We 
have  available  23  projections,  uniformly  distributed  over  ap¬ 
proximately  the  full  180°  of  a  section  of  the  third  molar  tooth. 
A  typical  section  has  four  basic  components:  surrounding  air, 
a  layer  of  enamel,  interior  dentin,  and  pulp  (the  two  dark 
holes),  see  Figure  4.  To  produce  satisfactory  results,  it  was 
necessary  only  H  =  100  iterations.  To  test  the  effectiveness 
of  our  method,  we  reconstructed  from  (i)  all  the  23  projec¬ 
tions,  (ii)  15  (out  of  the  23)  consecutive  projections  (leaving 
then  a  wedge  of  uncovered  angles),  and  (iii)  8  (out  of  the  23) 
projections  approximately  uniformly  distributed  over  the  full 
180°.  In  the  first  two  cases,  5  was  lowered  linearly  from  0.005 
to  0.001  in  100  iterations,  and  in  the  case  (iii),  from  0.01  to 
0.001.  Again,  we  observe  from  Figure  4  that  our  approach 
delivers  reconstructions  with  less  artifacts,  without  sacrific¬ 
ing  the  contrast. 

4.  CONCLUSIONS  AND  DISCUSSION 

In  this  work,  we  introduced  the  framework  of  learning  sparse 
representations  for  density  reconstruction  in  limited  angle  to¬ 
mography.  The  reconstruction  algorithm  aims  at  minimizing 


Fig.  4.  Reconstructions  of  a  section  of  a  tooth  from  23  ( top 
row),  15  ( middle  row)  and  8  ( bottom  row)  projections ,  using 
a  FBP  method  ( left  column),  a  total-variation  based  method 
( center  column),  and  our  proposed  method  ( right  column ). 


a  functional,  encouraging  a  sparse  representation  of  the  im¬ 
age  patches  while  keeping  the  data  constraints  provided  by 
the  available  projections.  Preliminary  experimental  results 
from  both  simulated  and  real  data  demonstrated  that  the  pro¬ 
posed  reconstruction  method  outperforms  a  FBP  and  a  TV- 
based  method. 

As  mentioned  above,  the  sparsity-level  6  is  closely  related 
to  the  non-uniform  uncertainty  level,  and  it  was  here  left  as  an 
algorithm  parameter.  Automatically  computing  this  parame¬ 
ter  is  the  subject  of  important  future  research. 

Results  in  the  image  enhancement  literature  have  shown 
that  significant  improvements,  leading  to  state-of-the-art  re¬ 
construction,  can  be  achieved  by  initially  considering  a  dic¬ 
tionary  learned  from  large  databases  from  the  same  data  class 
[8,  10],  which  are  then  adapted  to  the  particular  image.  We 
plan  to  continue  our  line  of  research  in  this  direction,  and  re¬ 
sults  will  be  reported  elsewhere. 

Appendix-  A  TV-based  algorithm:  Regarding  the  implementation 
of  the  TV-based  method,  we  found  that  a  simple  algorithm,  consist¬ 
ing  of  alternating  between  TV-minimization  and  imposing  the  mea¬ 
surement  constraints,  produces  significantly  better  results  than  solv¬ 
ing  a  constrained  TV  minimization  using,  e.g.,  11 -magic  http://www. 
acm.caltech.edu/llmagic/.  The  results  produced  by  the  11 -magic  (us¬ 
ing  the  default  parameters  and  starting  with  the  FBP  reconstruction), 
were  very  similar  to  the  starting  image.  The  algorithm  for  all  the  TV 
reconstructions  in  this  paper  consisted  of  3,000  to  4,000  iterations 
of:  (i)  one  step  of  gradient  descent  for  the  TV  minimization  with 
fixed  step  size,  and  (ii)  five  cycles  of  ART.  It  was  observed  that,  in 
general,  the  reconstructions  did  not  improve  considerably  after  2,000 
iterations  and  the  optimal  step  size  was  between  10“ 3  and  10-4. 
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